An automated approach for abstracting execution logs to execution events

Authors

  • Zhen Ming Jiang
  • Ahmed E. Hassan
  • Gilbert Hamann
  • Parminder Flora
Abstract

Figure 3. High-level overview of our approach for abstracting execution logs to execution events.

Table III. Log lines used as a running example to explain our approach.

  1. Start check out
  2. Paid for, item=bag, quality=1, amount=100
  3. Paid for, item=book, quality=3, amount=150
  4. Check out, total amount is 250
  5. Check out done

Copyright © 2008 John Wiley & Sons, Ltd. J. Softw. Maint. Evol.: Res. Pract. 2008; 20:249–267 DOI: 10.1002/smr

Table IV. Running example logs after the anonymize step.

  1. Start check out
  2. Paid for, item=$v, quality=$v, amount=$v
  3. Paid for, item=$v, quality=$v, amount=$v
  4. Check out, total amount=$v
  5. Check out done

Table V. Running example logs after the tokenize step.

  Bin names (no. of words, no. of parameters)   Log lines
  (3,0)                                         1. Start check out
                                                5. Check out done
  (5,1)                                         4. Check out, total amount=$v
  (8,3)                                         2. Paid for, item=$v, quality=$v, amount=$v
                                                3. Paid for, item=$v, quality=$v, amount=$v

4.2.2. The tokenize step

The tokenize step separates the anonymized log lines into different groups (i.e., bins) according to the number of words and estimated parameters in each log line. The use of multiple bins limits the search space of the following step (i.e., the categorize step). The use of bins permits us to process large log files in a timely fashion using a limited memory footprint since the analysis is done per bin instead of having to load up all the lines in the log file. We estimate the number of parameters in a log line by counting the number of generic terms (i.e., $v). Log lines with the same number of tokens and parameters are placed in the same bin.

Table V shows the sample log lines after the anonymize and tokenize steps. The left column indicates the name of a bin. Each bin is named with a tuple: number of words and number of parameters that are contained in the log line associated with that bin.
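The binning rule of the tokenize step can be sketched in Python. This is a minimal sketch, not the authors' implementation. It assumes that words are delimited by whitespace, commas, and '='; the excerpt does not state the delimiters, and this choice is made only because it reproduces the word counts in Table V. The names split_words and tokenize are illustrative.

```python
import re
from collections import defaultdict

PARAM = "$v"  # generic token left behind by the anonymize step


def split_words(line):
    """Split a log line into words (assumed delimiters: whitespace, ',', '=')."""
    return [w for w in re.split(r"[\s,=]+", line) if w]


def tokenize(lines):
    """Group (line number, line) pairs into bins keyed by (words, parameters)."""
    bins = defaultdict(list)
    for number, line in lines:
        words = split_words(line)
        key = (len(words), words.count(PARAM))  # bin name, e.g. (8, 3)
        bins[key].append((number, line))
    return dict(bins)


# Running example after the anonymize step (Table IV).
anonymized = [
    (1, "Start check out"),
    (2, "Paid for, item=$v, quality=$v, amount=$v"),
    (3, "Paid for, item=$v, quality=$v, amount=$v"),
    (4, "Check out, total amount=$v"),
    (5, "Check out done"),
]
bins = tokenize(anonymized)
```

With these assumptions, bins holds the three keys (3, 0), (5, 1), and (8, 3), with lines 1 and 5, line 4, and lines 2 and 3 respectively, as in Table V. Because each bin can be processed on its own, a larger variant could write bins to separate files instead of keeping them in one dictionary.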
The right column in Table V shows the log lines. Each row shows the bin and its corresponding log lines. The second and the third log lines contain 8 words and are likely to contain 3 parameters. Thus, the second and third log lines are grouped together in the (8,3) bin. Similarly, the first and last log lines are grouped together in the (3,0) bin since they both contain 3 words and are likely to contain no parameters.

4.2.3. The categorize step

The categorize step compares log lines in each bin and abstracts them to the corresponding execution events. The inferred execution events are stored in an execution events database for future reference. The algorithm used in the categorize step is shown below. Our algorithm goes through the log lines bin by bin. After this step, each log line should be abstracted to an execution event. Table VI shows the results of our working example after the categorize step.

Table VI. Running example logs after the categorize step.

  Execution events (word parameter id)   Log lines
  3 0 1                                  1. Start check out
  3 0 2                                  5. Check out done
  5 1 1                                  4. Check out, total amount=$v
  8 3 1                                  2. Paid for, item=$v, quality=$v, amount=$v
  8 3 1                                  3. Paid for, item=$v, quality=$v, amount=$v

  for each bin bi
    for each log line lk in bin bi
      for each execution event e(bi, j) corresponding to bi in the events DB
        perform word by word comparison between e(bi, j) and lk
        if (there is no difference) then
          lk is of type e(bi, j)
          break
        end if
      end for // advance to next e(bi, j)
      if (lk does not have a matching execution event) then
        lk is a new execution event
        store an abstracted lk into the execution events DB
      end if
    end for // advance to the next log line
  end for // advance to the next bin

We now explain our algorithm using the running example. Our algorithm starts with the (3,0) bin.
Initially, there are no execution events that correspond to this bin. Therefore, the first log line becomes the first execution event, namely 3 0 1. The 1 at the end of 3 0 1 indicates that this is the first execution event to correspond to the bin that has 3 words and no parameters (i.e., bin 3 0). Then the algorithm moves to the next log line in the (3,0) bin, which is the fifth log line. The algorithm compares the fifth log line with all the existing execution events in the (3,0) bin. Currently, there is only one execution event: 3 0 1. As the fifth log line does not match the 3 0 1 execution event, we create a new execution event 3 0 2 for the fifth log line.

With all the log lines in the (3,0) bin processed, we can move on to the (5,1) bin. As there are no execution events that correspond to the (5,1) bin initially, the fourth log line is assigned to a new execution event 5 1 1.

Finally, we move on to the (8,3) bin. First, the second log line is assigned a new execution event 8 3 1 since there are no execution events corresponding to this bin yet. As the third log line is the same as the second log line (after the anonymize step), the third log line is categorized as the same execution event as the second log line.

Table VI shows the sample log lines after the categorize step. The left column is the abstracted execution event. The right column shows the line number together with the corresponding log lines.

4.2.4. The reconcile step

Since the anonymize step uses heuristics to identify dynamic information in a log line, there is a chance that it fails to anonymize some dynamic information. When that happens, log lines that should share one execution event are abstracted to several execution events that are very similar.
Table VII shows an example of dynamic information that was missed by the anonymize step. The table shows five different execution events. However, the user names after ‘for user’ are dynamic information and should have been replaced by the generic token ‘$v’. All the log lines shown in Table VII should have been abstracted to the same execution event after the categorize step.

The reconcile step addresses this situation. All execution events are re-examined to identify which ones are to be merged. Execution events are merged if:

  1. They belong to the same bin.
  2. They differ from each other by one token at the same position.
  3. There are several such execution events. We used a threshold of five events in our case studies. Other values are possible, depending on the content of the analyzed log files.

The threshold prevents the merging of similar yet different execution events, such as ‘Start processing’ and ‘Stop processing’, which should not be merged.

Looking at the execution events in Table VII, we note that they all belong to the ‘5 0’ bin and differ from each other only in the last token. Since there are five such events, we merged them into one event. Table VIII shows the execution events from Table VII after the reconcile step. Note that if the ‘5 0’ bin contained another execution event, ‘Stop processing for user John’, it would not be merged with the above execution events since it differs from them by two tokens instead of only the last token.

Table VII. Sample logs that the categorize step would fail to abstract.

  Event IDs   Execution events
  5 0 1       Start processing for user Jen
  5 0 2       Start processing for user Tom
  5 0 3       Start processing for user Henry
  5 0 4       Start processing for user Jack
  5 0 5       Start processing for user Peter

Table VIII. Sample logs after the reconcile step.

  Event IDs   Execution events
  5 0 1       Start processing for user $v
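The categorize pseudocode of Section 4.2.3 can be turned into a short runnable Python sketch. It is an illustration under assumptions, not the authors' code: the execution events database is modelled as an in-memory dictionary, words are split on whitespace, commas, and '=' (an assumed delimiter set), and the names categorize, events_db, and assignments are hypothetical.

```python
import re


def split_words(line):
    """Split a log line into words (assumed delimiters: whitespace, ',', '=')."""
    return [w for w in re.split(r"[\s,=]+", line) if w]


def categorize(bins, events_db=None):
    """Abstract each log line in each bin to an execution event.

    bins:      {(words, params): [(line number, anonymized line), ...]}
    events_db: {(words, params): [word list of each known event, ...]}
    Returns the updated events_db and a {line number: event id} mapping.
    """
    events_db = {} if events_db is None else events_db
    assignments = {}
    for key in sorted(bins):                      # go through the lines bin by bin
        known = events_db.setdefault(key, [])
        for number, line in bins[key]:
            words = split_words(line)
            for j, event_words in enumerate(known, start=1):
                if event_words == words:          # word-by-word comparison
                    break                         # line is of type e(bin, j)
            else:
                known.append(words)               # no match: store a new event
                j = len(known)
            assignments[number] = f"{key[0]} {key[1]} {j}"
    return events_db, assignments


# Bins of the running example (Table V).
bins = {
    (3, 0): [(1, "Start check out"), (5, "Check out done")],
    (5, 1): [(4, "Check out, total amount=$v")],
    (8, 3): [
        (2, "Paid for, item=$v, quality=$v, amount=$v"),
        (3, "Paid for, item=$v, quality=$v, amount=$v"),
    ],
}
events_db, assignments = categorize(bins)
```

Under these assumptions, assignments maps lines 1, 5, 4, 2, and 3 to the event IDs 3 0 1, 3 0 2, 5 1 1, 8 3 1, and 8 3 1, which agrees with Table VI. Passing a previously returned events_db back in keeps IDs stable across runs, which is one way to read "stored in an execution events database for future reference".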
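The three merge conditions of the reconcile step can also be sketched in Python. This is a minimal sketch under stated assumptions: the function receives the distinct events of a single bin as word tuples, "a few" is read as "at least the threshold" (five, as in the case studies), and the differing token is replaced by ‘$v’. The name reconcile and its signature are illustrative, not taken from the paper.

```python
from collections import defaultdict

PARAM = "$v"  # generic token used for dynamic information


def reconcile(events, threshold=5):
    """Merge events of one bin that differ only at a single shared position.

    events: distinct word tuples that all belong to the same bin.
    Returns the merged templates followed by the events left untouched.
    """
    remaining = list(events)
    merged = []
    length = len(remaining[0]) if remaining else 0
    for pos in range(length):
        # Group events that become identical once position `pos` is masked.
        groups = defaultdict(list)
        for ev in remaining:
            groups[ev[:pos] + (PARAM,) + ev[pos + 1:]].append(ev)
        for template, members in groups.items():
            if len(members) >= threshold:         # enough near-duplicates
                merged.append(template)
                remaining = [ev for ev in remaining if ev not in members]
    return merged + remaining


# Events of Table VII plus the counter-example from the text.
events = [
    ("Start", "processing", "for", "user", "Jen"),
    ("Start", "processing", "for", "user", "Tom"),
    ("Start", "processing", "for", "user", "Henry"),
    ("Start", "processing", "for", "user", "Jack"),
    ("Start", "processing", "for", "user", "Peter"),
    ("Stop", "processing", "for", "user", "John"),
]
result = reconcile(events)
```

Here result contains the merged template "Start processing for user $v" and the unmerged "Stop processing for user John". Calling reconcile on only ("Start", "processing") and ("Stop", "processing") returns both events unchanged, because two events fall below the threshold.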


Similar resources

Using Mapreduce to Scale Events Correlation Discovery for Business Processes Mining

Hicham Reguieg, Farouk Toumani, Hamid Reza Motahari Nezhad, Boualem Benatallah. HP Laboratories, HPL-2012-170. Keywords: business processes; event correlation; map reduce. The volume of data related to business process execution is increasing significantly in the enterprise. Many of data sources include events related to th...


An integrated simulation-DEA approach to multi-criteria ranking of scenarios for execution of operations in a construction project

The purpose of this study is to examine different scenarios for implementing operations in the pre-construction phase of a project, based on several competing criteria with different importance levels in order to achieve a more efficient execution plan. This paper presents a new framework that integrates discrete event simulation (DES) and data envelopment analysis (DEA) to rank different scena...


Fuzzy gain scheduling of PID controller for stiction compensation in pneumatic control valve

Inherent nonlinearities like, deadband, stiction and hysteresis in control valves degenerate plant performance. Valve stiction standouts as a more widely recognized reason for poor execution in control loops. Measurement of valve stiction is essential to maintain scheduling. For industrial scenarios, loss of execution due to nonlinearity in control valves is an imperative issue that should be t...


Soccer Goalkeeper Task Modeling and Analysis by Petri Nets

In a robotic soccer team, goalkeeper is an important challenging role, which has different characteristics from the other teammates. This paper proposes a new learning-based behavior model for a soccer goalkeeper robot by using Petri nets. The model focuses on modeling and analyzing, both qualitatively and quantitatively, for the goalkeeper role so that we have a model-based knowledge of the ta...


Concept drift detection in business process logs using deep learning

Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects and etc.), traditional process mining techniques ...



Journal title:
  • Journal of Software Maintenance

Volume 20  Issue 

Pages  -

Publication date: 2008